All Questions
Tagged with scikit-learn and cross-validation
109 questions
1 vote
0 answers
102 views
Confused about use of random states for training models in scikit
I am new to ML and currently working on improving the accuracy of an MLPClassifier in scikit. My code looks like so ...
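A minimal sketch of the usual answer, assuming the question is about reproducibility (the toy data and parameter values below are placeholders, not the asker's code): fixing random_state makes MLPClassifier runs repeatable, but it is not a tuning knob for accuracy.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

# Synthetic data standing in for the asker's dataset.
X, y = make_classification(n_samples=200, random_state=0)

# random_state fixes weight initialisation and batch shuffling, so repeated
# runs give identical results; it does not by itself improve accuracy.
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=42)
print(cross_val_score(clf, X, y, cv=5).mean())
```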
3 votes
1 answer
636 views
Why does scikit's cross-validation return a negative R^2 for my strongly correlated data
I have exactly the following preprocessed data in a small Pandas dataframe: ...
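An illustrative sketch (synthetic data, not the asker's dataframe) of how this can happen: on a held-out fold, R² is unbounded below, so a model that predicts worse than the fold's mean scores negative even when features and target are correlated.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 1))
y = 2 * X.ravel() + rng.normal(scale=5.0, size=30)  # correlated but noisy

# R^2 on each held-out fold; with small, noisy samples individual
# folds can easily come out negative.
print(cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2"))
```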
0 votes
1 answer
37 views
Sklearn EstimatorCV vs GridSearchCV
sklearn has the following description for EstimatorCV estimators: https://scikit-learn.org/stable/glossary.html#term-cross-validation-estimator An estimator that has built-in cross-validation ...
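A sketch of the contrast the glossary draws, using LogisticRegressionCV as an example EstimatorCV: the built-in variant can reuse computation along its regularisation path, while GridSearchCV refits a fresh estimator for every candidate value.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, random_state=0)

# Built-in CV: searches a path of C values efficiently.
lr_cv = LogisticRegressionCV(Cs=10, cv=5).fit(X, y)

# Generic CV: refits from scratch for every candidate C.
grid = GridSearchCV(LogisticRegression(), {"C": [0.01, 0.1, 1, 10]}, cv=5).fit(X, y)
print(lr_cv.C_, grid.best_params_)
```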
2 votes
1 answer
31 views
Scoring function in cross-validation often left default
I'm a PhD student applying ML in microbiology. In research papers, the usual performance measure reported on classification models is ROC-AUC. But when I look at implementations, the scoring function ...
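A small sketch of the gap the question describes: cross_val_score falls back to the estimator's default .score() (accuracy for classifiers) unless a scorer is passed, so reporting ROC-AUC requires an explicit scoring="roc_auc".

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Imbalanced toy data, where accuracy and ROC-AUC diverge noticeably.
X, y = make_classification(n_samples=300, weights=[0.9, 0.1], random_state=0)
clf = LogisticRegression(max_iter=1000)

print(cross_val_score(clf, X, y, cv=5).mean())                     # accuracy
print(cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean())  # ROC-AUC
```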
1 vote
1 answer
91 views
How do I identify overfitting when using GridSearchCV?
For context, I'm using scikit-learn's GridSearchCV to find the best hyperparameters of a decision tree. I believe I understand train, validation, and test sets and overfitting concepts when applied ...
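One common diagnostic, sketched here on toy data: ask GridSearchCV for training scores and compare them against the validation scores in cv_results_; a large train/validation gap at the selected parameters is the usual overfitting signal.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
grid = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    {"max_depth": [2, 5, 10, None]},
    cv=5,
    return_train_score=True,  # off by default
).fit(X, y)

# A mean_train_score far above mean_test_score flags overfitting.
res = pd.DataFrame(grid.cv_results_)
print(res[["param_max_depth", "mean_train_score", "mean_test_score"]])
```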
0 votes
0 answers
23 views
How to use cross validation to select/evaluate model with probability score as the output?
Initially I was evaluating my models using cross_val_score with out-of-the-box metrics such as precision, recall, F1 score, etc., or with my own metrics defined in ...
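A sketch of one standard approach, assuming the goal is to evaluate the probability scores themselves: proper scoring rules such as log loss or the Brier score plug straight into cross_val_score (sklearn negates losses so that "greater is better" holds uniformly).

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, random_state=0)
clf = LogisticRegression(max_iter=1000)

# Both scorers use predict_proba, so they assess the probabilities directly.
print(cross_val_score(clf, X, y, cv=5, scoring="neg_log_loss").mean())
print(cross_val_score(clf, X, y, cv=5, scoring="neg_brier_score").mean())
```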
1 vote
1 answer
175 views
Integration of feature selection in a Pipeline
I have noticed that integrating feature selection in a pipeline alters results. Pipeline 1 gives slightly different results from pipeline 2. Why should this be so? Pipeline 2 ...
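A sketch of the usual explanation (the selector and classifier below are placeholders, not the asker's pipelines): when the selector is a pipeline step, it is refitted on each training fold, whereas selecting features once on the full data leaks test-fold information into every fold, so the two setups legitimately score differently.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=200, n_features=50, random_state=0)

# The selector sits inside the pipeline, so it only ever sees training folds.
pipe = Pipeline([
    ("select", SelectKBest(f_classif, k=10)),
    ("clf", LogisticRegression(max_iter=1000)),
])
print(cross_val_score(pipe, X, y, cv=5).mean())
```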
0 votes
1 answer
90 views
Error when using KFold() and the roc_auc metric
Why does cross_val_score(pipe, X, y, scoring="roc_auc", cv=StratifiedKFold()) work just fine, while using KFold() like ...
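A sketch of the likely failure mode (synthetic data, not the asker's): with plain KFold on data whose labels are grouped, a test fold can contain a single class, and ROC-AUC is undefined in that case; StratifiedKFold preserves the class ratio in every fold, so it succeeds.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, StratifiedKFold, cross_val_score

X = np.random.default_rng(0).normal(size=(100, 5))
y = np.array([0] * 50 + [1] * 50)  # sorted labels: plain KFold folds are one-class

clf = LogisticRegression()
print(cross_val_score(clf, X, y, scoring="roc_auc", cv=StratifiedKFold()))  # works
print(cross_val_score(clf, X, y, scoring="roc_auc", cv=KFold()))  # NaNs + warnings
```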
0 votes
1 answer
629 views
Tuned model has higher CV accuracy, but a lower test accuracy. Should I use the tuned or untuned model?
I am working on a classification problem using scikit-learn and am confused about how to properly tune hyperparameters to get the "best" model. Before any tuning, my logistic regression ...
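A sketch of the methodology most answers recommend (toy data and grid are assumptions): select between models using cross-validation on the training portion only, and spend the test set on a single final estimate rather than on the model choice itself.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=400, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Tune on the training split only; the test split plays no role in selection.
grid = GridSearchCV(LogisticRegression(max_iter=1000),
                    {"C": [0.01, 0.1, 1, 10]}, cv=5).fit(X_tr, y_tr)
print(grid.best_score_)        # CV accuracy of the tuned model
print(grid.score(X_te, y_te))  # one final, untouched test estimate
```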
0 votes
1 answer
102 views
How do I test whether overfitting exists when I use the cross_val_score method?
I got the following code from a book on XGBoost. I wonder whether this is a correct way of analyzing the cross-validation score for overfitting purposes. The mean accuracy is 81%, which can be okay, but what if ...
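A sketch of how to get the missing signal, assuming the book's snippet used cross_val_score: cross_validate with return_train_score=True exposes the train/validation gap that a lone mean accuracy hides (XGBClassifier here is an assumption based on the book's topic).

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from xgboost import XGBClassifier  # assumed model, matching the book

X, y = make_classification(n_samples=500, random_state=0)

# A training score far above the validation score suggests overfitting;
# similar scores suggest the mean accuracy is trustworthy.
res = cross_validate(XGBClassifier(), X, y, cv=5, return_train_score=True)
print(res["train_score"].mean(), res["test_score"].mean())
```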
1 vote
1 answer
853 views
Does sklearn perform feature selection within cross validation?
I would like to add a feature selector to my pipeline and use GridSearchCV to tune both the hyperparameters of the selector and the classifier(s). I am wondering whether sklearn performs feature selection ...
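A minimal sketch of the affirmative answer: with the selector as a pipeline step, GridSearchCV refits it inside every training fold and can tune its hyperparameters (here the selector's k, addressed via the step-name prefix) alongside the classifier's.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=30, random_state=0)
pipe = Pipeline([("select", SelectKBest()), ("clf", SVC())])

# "step__param" syntax tunes selector and classifier jointly.
param_grid = {"select__k": [5, 10, 20], "clf__C": [0.1, 1, 10]}
grid = GridSearchCV(pipe, param_grid, cv=5).fit(X, y)
print(grid.best_params_)
```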
0 votes
1 answer
1k views
Is there any benefit to using cross validation from the XGBoost library over sklearn when tuning hyperparameters?
The XGBoost library has its own implementation of cross-validation through xgboost.cv(). It looks like it requires data to be stored as a DMatrix. Instead of using ...
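A sketch of the library-native route for comparison (parameters are illustrative): xgboost.cv consumes a DMatrix and applies early stopping per boosting round across all folds, which sklearn's generic CV utilities do not do.

```python
import xgboost as xgb
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, random_state=0)
dtrain = xgb.DMatrix(X, label=y)  # the DMatrix the question mentions

# Early stopping chooses the number of boosting rounds from the fold AUCs.
results = xgb.cv(
    {"objective": "binary:logistic", "eta": 0.1},
    dtrain,
    num_boost_round=200,
    nfold=5,
    early_stopping_rounds=10,
    metrics="auc",
)
print(results.tail(1))  # per-round train/test AUC as a DataFrame
```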
0 votes
1 answer
87 views
Confusion regarding K-fold Cross Validation
In k-fold cross-validation, we divide the dataset into k folds, train the model on k-1 folds, and test it on the remaining fold. We do so until every fold has been assigned as the test ...
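A tiny sketch that makes the rotation concrete: over k iterations every fold serves as the test set exactly once, so each sample is held out exactly once.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(10).reshape(-1, 1)  # ten samples, five folds of two
for i, (train_idx, test_idx) in enumerate(KFold(n_splits=5).split(X)):
    print(f"fold {i}: train={train_idx}, test={test_idx}")
```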
0 votes
1 answer
241 views
Can I use scikit-learn's cross_val_predict with cross_validate?
I am looking to make a visualization of my cross-validation data in which I can visualize the predictions that occurred within the cross-validation process. I am using scikit-learn's cross_validate to ...
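A sketch of one way to combine the two, assuming a fixed, seeded splitter: passing the same cv object to cross_validate and cross_val_predict makes the per-sample out-of-fold predictions line up with the fold-level scores, ready for plotting.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_predict, cross_validate

X, y = make_classification(n_samples=300, random_state=0)
cv = KFold(n_splits=5, shuffle=True, random_state=0)  # fixed splits
clf = LogisticRegression(max_iter=1000)

scores = cross_validate(clf, X, y, cv=cv)    # fold-level scores
preds = cross_val_predict(clf, X, y, cv=cv)  # one out-of-fold prediction per sample
print(scores["test_score"].mean(), (preds == y).mean())
```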
4 votes
0 answers
92 views
Does a different ROC AUC between cross-validation and the test set indicate overfitting or another problem?
I am training a composite model (XGBoost, Linear Regression, and RandomForest) to predict the probability that a person is injured. These are the results of cross-validation with 5 folds. I can't see any problem ...
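A sketch of the comparison at issue (a single RandomForest stands in for the asker's composite model, on synthetic data): CV AUC on the training portion versus AUC on the held-out test set; a small gap is ordinary sampling variation, a large one suggests overfitting or a train/test distribution shift.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=600, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(random_state=0)
cv_auc = cross_val_score(clf, X_tr, y_tr, cv=5, scoring="roc_auc").mean()
test_auc = roc_auc_score(y_te, clf.fit(X_tr, y_tr).predict_proba(X_te)[:, 1])
print(cv_auc, test_auc)  # compare the two estimates
```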